Lossless data compression has been studied for many NASA missions to increase science return and to reduce onboard memory requirements, station contact time, and communication bandwidth. This paper first addresses the requirements for onboard applications and provides the rationale for selecting the Rice algorithm over other available techniques. A top-level description of the Rice algorithm is given, along with some new capabilities already implemented in both software and hardware (VLSI) forms. The paper then addresses systems issues important for onboard implementation, including sensor calibration, error propagation, and data packetization. The latter part of the paper presents several case study examples drawn from a broad spectrum of science instruments, including the thematic mapper, an x-ray telescope, a gamma-ray spectrometer, and an acousto-optical spectrometer.


INTRODUCTION 

With the development of new advanced instruments for remote sensing applications, sensor data will be generated at rates that not only require increased onboard processing and storage capability but also impose demands on the communication link and the ground data management system. Data compression provides a viable means of alleviating these demands. Two types of data compression have been studied by many researchers in information theory: lossless techniques, which guarantee full reconstruction of the data, and lossy techniques, which generally give higher compression ratios but introduce distortion in the reconstructed data. To satisfy the many science disciplines NASA supports, lossless data compression has become the priority for technology development in this area.

To implement a data compression technique on the spacecraft, several criteria are considered: 

1. The algorithm has to adapt to the changes in data to maximize performance. 

2. It can be easily implemented with few processing steps, small memory and little power. 

3. It can be easily interfaced with a packetized data system without performance degradation. 

There exist a few well-known lossless compression techniques, including Huffman coding, arithmetic coding, the Ziv-Lempel algorithm, and variants of each. After extensive study and performance comparison on the same test image data set (Venbrux, 92)(Yeh, 91, 93), the Rice algorithm, which originated at the Jet Propulsion Laboratory (Rice, 79), was selected for implementation.


1. Part of this paper is taken from NASA Technical Paper 3441, “Application Guide for Universal Source Encoding for Space,” by the authors, Dec. 1993, and was presented at the International Geoscience and Remote Sensing Symposium, 94.
The Rice algorithm is essentially a set of Huffman codes organized in a structure that does not require lookup tables. The set of codes can easily be extended to cover the information range of the science data. The algorithm adapts to changes in the statistics of the data and is easy to implement. Its structure also permits a simple interface to a data packetization scheme, since no side information has to be carried across packet boundaries. Therefore, its performance is independent of file size.

In 1991, a hardware engineering model was built as an Application Specific Integrated Circuit (ASIC) for proof of concept. This chip set was named the Universal Source Encoder/Universal Source Decoder (USE/USD) (Venbrux, 92). It was later redesigned with several additional capabilities and implemented in Very Large Scale Integration (VLSI) circuits using gate arrays suitable for space missions. The flight circuit is referred to as the Universal Source Encoder for Space (USES). The fabricated USES chip can process data at up to 20 Msamples/second and accepts data quantized from 4 to 15 bits (MRC, 93).

A description of the Rice algorithm is given in the next section, followed by systems issues and case study examples using remote sensing data either acquired from launched spacecraft or simulated for future missions.

THE RICE ALGORITHM ARCHITECTURE 

A block diagram of the architecture of the Rice algorithm (Rice, 91, 93) is given in Figure 1. It consists of a preprocessor, which decorrelates the data samples and then maps them into symbols suitable for the following entropy coding stage, and an entropy coder. The entropy coding module is a collection of options operating in parallel over a large entropy range. The option yielding the fewest coding bits is selected. This selection is performed over each block of J samples to achieve adaptability to scene statistics. An Identification (ID) bit pattern identifies the option selected for each block of J input samples.
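The option selection just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the USES design: the sample width `N_BITS`, the ID length `ID_BITS`, the range of split parameters k, and all helper names are hypothetical, and the preprocessed samples are assumed to be non-negative integers. With k = 0 to 6 plus the default option there are eight options, which is why a 3-bit ID is assumed here.

```python
from typing import Dict, List, Tuple

# Hypothetical parameters, chosen only for illustration.
N_BITS = 8    # quantization of the input samples, in bits
J = 16        # block size: samples per option decision
ID_BITS = 3   # bits identifying the selected option (8 options assumed)

def split_cost(block: List[int], k: int) -> int:
    """Bits used by a sample-split option: comma code of d >> k plus k LSBs."""
    return sum((d >> k) + 1 + k for d in block)

def select_option(block: List[int]) -> Tuple[str, int]:
    """Evaluate every option on one block and keep the cheapest one."""
    costs: Dict[str, int] = {f"k={k}": split_cost(block, k) for k in range(N_BITS - 1)}
    costs["default"] = N_BITS * len(block)  # pass-through option bounds the cost
    best = min(costs, key=lambda name: costs[name])
    return best, costs[best]

def coded_bits(mapped: List[int]) -> int:
    """Total bits for a stream: one ID plus the cheapest option per J-sample block."""
    return sum(ID_BITS + select_option(mapped[i:i + J])[1]
               for i in range(0, len(mapped), J))
```

For example, an all-zero block selects `k=0` at 16 bits, while a block of large values such as 255 falls back to the default option at 128 bits. In hardware the options are evaluated in parallel; the loop here only mimics that result.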
The Preprocessor

The predictor in the preprocessor can be as simple as a first-order predictor that uses the previous sample, or it can be a higher-order predictor. To maintain pipelined processing in the hardware, only a few predictor types are implemented: a 1D predictor, a 2D predictor, a multispectral predictor, and a user-supplied external predictor.

The function of the predictive coder is to decorrelate the incoming data stream by taking the difference between each data sample and its predicted value. The mapper takes these difference values, both positive and negative, and orders them, based on the predicted values, sequentially into non-negative integers.
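The paper does not give the mapping explicitly. The sketch below assumes the form later standardized for Rice coding in the CCSDS lossless compression recommendation: with prediction x̂ and sample range [x_min, x_max], θ = min(x̂ − x_min, x_max − x̂); small positive and negative errors are interleaved as 0, 1, 2, ..., and the larger errors that can only occur on one side (because of the range limits) follow after them. The unit-delay predictor and the function names are illustrative.

```python
from typing import List

def map_residual(x: int, x_hat: int, x_min: int, x_max: int) -> int:
    """Map the prediction error x - x_hat onto a non-negative integer."""
    delta = x - x_hat
    theta = min(x_hat - x_min, x_max - x_hat)
    if 0 <= delta <= theta:
        return 2 * delta          # non-negative errors -> even codes
    if -theta <= delta < 0:
        return 2 * (-delta) - 1   # negative errors -> odd codes
    return theta + abs(delta)     # one-sided large errors follow the interleaved ones

def preprocess_1d(samples: List[int], n_bits: int) -> List[int]:
    """Unit-delay (previous-sample) prediction followed by mapping.

    The first sample acts as the reference and is not included in the output.
    """
    x_min, x_max = 0, (1 << n_bits) - 1
    return [map_residual(x, prev, x_min, x_max)
            for prev, x in zip(samples, samples[1:])]
```

Because θ depends on the prediction, the mapping is a one-to-one correspondence between the possible sample values and the integers 0 to 2^n − 1, so no code space is wasted near the ends of the range.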

Entropy Coder 

Most of the options in the entropy coder are called “sample-split options.” These options take a block of J preprocessed data samples, split off the k least significant bits of each sample, and code the remaining higher-order bits with a simple comma code before appending the split-off bits to the coded data stream. Each sample-split option in the Rice algorithm is optimal over an entropy range about one bit/sample wide (Yeh, 93); only the option yielding the fewest coding bits is chosen and identified for a J-sample data block by the option select logic. This ensures that each block is coded with whichever of the available Huffman codes performs best on that block. The k = 0 option is optimal in the entropy range of 1.5–2.5 bits/sample, the k = 1 option in the range of 2.5–3.5 bits/sample, and so on for other values of k.
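A sample-split option with parameter k can be sketched as follows, using strings of '0'/'1' characters for readability rather than packed bits. As one reading of the text, the comma code of a value m is m zeros followed by a one, and the k split-off LSBs of all samples are appended after the block's comma codes. The function names are hypothetical.

```python
from typing import List

def encode_split(block: List[int], k: int) -> str:
    """Comma-code the high bits (d >> k) of each sample, then append the k LSBs."""
    comma = "".join("0" * (d >> k) + "1" for d in block)
    lsbs = "".join(format(d & ((1 << k) - 1), f"0{k}b") for d in block) if k else ""
    return comma + lsbs

def decode_split(bits: str, k: int, count: int) -> List[int]:
    """Invert encode_split for a block of `count` samples."""
    highs, pos = [], 0
    for _ in range(count):
        zeros = 0
        while bits[pos] == "0":   # count zeros up to the terminating '1'
            zeros += 1
            pos += 1
        pos += 1                  # skip the '1'
        highs.append(zeros)
    out = []
    for high in highs:
        low = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append((high << k) | low)
    return out
```

For the block [0, 3, 5, 12] and k = 2, the comma codes are "1", "1", "01", "0001" and the split-off bits are "00", "11", "01", "00", giving 16 bits in total. A larger k shortens the comma codes but costs k bits per sample, which is why each k suits a different entropy range.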

To improve performance below 1 bit/sample, a new option was devised and included in the full set of options implemented in VLSI. This option is particularly efficient on data with very low entropy.
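The construction of the low-entropy option is not described here. Purely as an illustration, and as an assumption about what such an option can look like, the sketch below follows the "second extension" used in the later CCSDS Rice-based recommendation: adjacent mapped samples are paired, each pair is mapped to a single index, and the index is comma-coded. A block of zeros then costs half a bit per sample instead of the one bit per sample of the k = 0 option.

```python
from typing import List

def pair_index(a: int, b: int) -> int:
    """Enumerate pairs by their sum so that small pairs get small indices."""
    s = a + b
    return s * (s + 1) // 2 + b

def pair_code_bits(block: List[int]) -> int:
    """Comma-code bits for a block with an even number of mapped samples."""
    if len(block) % 2:
        raise ValueError("this sketch assumes an even block size")
    return sum(pair_index(a, b) + 1 for a, b in zip(block[0::2], block[1::2]))
```

With this pairing, 16 zero samples take 8 bits, and the alternating block [1, 0, 1, 0, ...] of 16 samples takes 16 bits, compared with 24 bits under the k = 0 comma code.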

The default option uses neither the sample-split options nor the low-entropy option. It bounds the performance of the algorithm by passing the preprocessed block of data through the encoder without alteration, adding only the identifier.
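The bound provided by the default option follows from simple arithmetic. For a block of J samples quantized to n bits, with an option identifier of b_ID bits (the identifier length is not given in the text and is left symbolic), the coded block length L_block and the average codeword length satisfy:

```latex
L_{\text{block}} \le J\,n + b_{\text{ID}},
\qquad
\bar{L} \le n + \frac{b_{\text{ID}}}{J}\ \text{bits/sample}.
```

For instance, with the hypothetical values n = 8, J = 16 and b_ID = 3, the average codeword length can never exceed 8 + 3/16 = 8.1875 bits/sample, so even incompressible data expand by at most about 2.3%.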

SYSTEMS ISSUES 

Several systems issues related to embedding a data compression scheme onboard a spacecraft must be addressed. These include the relation between the focal-plane array arrangement and the data sampling/prediction direction; the data packetization scheme and how it relates to error propagation when bit errors occur in the communication channel; and how packetization may affect compression performance.

Sensor Calibration 

Advanced imaging instruments and spectrometers often use arrays of individual detectors arranged in a 1D or 2D configuration; one example is the Charge Coupled Device (CCD). These individual detectors tend to respond slightly differently to the same input photon intensity. For instance, CCDs usually have a different gain and dark current value for each detector element. It is important to have the sensor well calibrated so that the data reflect the actual signals received by the sensor. Simulations have shown that for CCD-type sensors, gain and dark current variations as small as 0.2% of the full dynamic range can make the prediction scheme less effective for data compression. Besides calibration, to maximize compression gain, the 1D prediction scheme is much more effective when data acquired on one detector element are used to predict subsequent data acquired on the same detector, whose characteristics are, in general, stationary over the data collection period.
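The effect of detector-to-detector differences on prediction can be illustrated with a toy example. The four-element array, its offset values, and the uniform scene below are invented for illustration, and only a per-detector dark-current offset is modeled (gain differences act similarly on a non-uniform scene).

```python
from typing import List

def mean_abs_residual(seq: List[int]) -> float:
    """Mean |x[i] - x[i-1]| for a unit-delay predictor along a sequence."""
    diffs = [abs(b - a) for a, b in zip(seq, seq[1:])]
    return sum(diffs) / len(diffs)

# Hypothetical 4-detector linear array viewing a perfectly uniform scene.
# Each detector adds its own uncalibrated dark-current offset.
offsets = [0, 7, -3, 12]
scene = 100
lines = [[scene + off for off in offsets] for _ in range(5)]  # 5 readouts in time

# Predicting across detectors (along one readout): offsets leak into the residuals.
across = mean_abs_residual(lines[0])
# Predicting along one detector (same element over time): the offset cancels.
along = mean_abs_residual([row[1] for row in lines])
```

Here `across` is 32/3 (about 10.7 counts) while `along` is 0, even though the scene itself is perfectly flat.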

Error Propagation 

User tolerance of distortion resulting from channel bit errors is mission dependent and is strongly a function of the received Signal-to-Noise Ratio (SNR), the data format, the error detection and correction technique, and the percentage of distorted data that is tolerable. A major concern in using data compression is the possibility of error propagation in the event of a single bit error in the compressed data stream. During decompression, a single bit error can lead to reconstruction errors over extended runs of data points. A general approach to minimizing this effect is to provide a very clean channel by using an error detection/correction scheme. For a compressed data stream, this still does not prevent an error that does get through from propagating across a decompressed scan line. Further protection can be achieved by using a packet data structure in conjunction with a properly chosen error correction scheme, as advocated in the Consultative Committee for Space Data Systems (CCSDS) Blue Book (CCSDS, 89). With this scheme, a decompression error resulting from a bit error is contained within one packet, provided the compression algorithm does not carry side information across packet boundaries.
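A toy simulation shows why independently coded packets contain an error. Each packet below is comma-coded on its own, with no state carried between packets, and the receiver is assumed to know each packet's boundaries and sample count, as packet headers would provide. The packet contents and the position of the flipped bit are arbitrary.

```python
from typing import List

def fs_encode(values: List[int]) -> str:
    """Comma code: value m -> m zeros followed by a '1'."""
    return "".join("0" * v + "1" for v in values)

def fs_decode(bits: str, count: int) -> List[int]:
    """Decode up to `count` values; stops early if the bits run out."""
    out, zeros = [], 0
    for b in bits:
        if len(out) == count:
            break
        if b == "0":
            zeros += 1
        else:
            out.append(zeros)
            zeros = 0
    return out

packets = [[1, 0, 2, 1], [3, 0, 0, 1]]    # mapped samples, one list per packet
coded = [fs_encode(p) for p in packets]   # each packet coded independently

# Simulated channel error: flip the third bit of the first packet.
flipped = "1" if coded[0][2] == "0" else "0"
received = [coded[0][:2] + flipped + coded[0][3:], coded[1]]
decoded = [fs_decode(bits, len(p)) for bits, p in zip(received, packets)]
```

The single flipped bit merges two codewords in the first packet (it decodes as [1, 3, 1], one sample short), while the second packet decodes exactly because nothing from the first packet is needed to interpret it.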

Packetization and Compression Performance 

Packetization is not only a means of containing bit errors locally, as just mentioned; it is also a logical way to transport the variable-length bit strings produced by entropy coding. Because the Rice algorithm chooses an option for every block of samples, its performance is optimized within each block and there is no need to pass side information or statistics across packet boundaries. Other compression techniques have performance that depends on establishing long-term statistics in a file. These schemes give good compression performance for a large file and poor performance for a relatively small one. When packetization is used with these algorithms to prevent error propagation, one would expect better compression performance for larger packets. The drawback is that the amount of data lost to a single bit error grows with packet size and may become intolerable.
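The packet-size independence of the Rice algorithm can be checked with a small sketch: because each block's cost depends only on that block, splitting a stream into packets that hold whole blocks leaves the total coded length unchanged. The block size, ID length, split range, and data below are hypothetical, and the default option, packet headers, and per-packet reference samples are omitted for brevity.

```python
from typing import List

J = 16        # hypothetical block size
ID_BITS = 3   # hypothetical option-ID length
K_MAX = 6     # hypothetical largest split parameter

def block_bits(block: List[int]) -> int:
    """ID plus the cheapest sample-split option for one block."""
    return ID_BITS + min(sum((d >> k) + 1 + k for d in block) for k in range(K_MAX + 1))

def packet_bits(samples: List[int]) -> int:
    """Bits for one packet; no statistics are carried in from earlier packets."""
    return sum(block_bits(samples[i:i + J]) for i in range(0, len(samples), J))

def total_bits(samples: List[int], packet_len: int) -> int:
    """Total bits when the stream is split into packets of `packet_len` samples."""
    return sum(packet_bits(samples[i:i + packet_len])
               for i in range(0, len(samples), packet_len))

data = [(7 * i) % 23 for i in range(256)]  # arbitrary mapped samples
```

With one 256-sample packet, eight 32-sample packets, or sixteen 16-sample packets, the total is the same. A scheme that learns statistics over the whole file would instead restart its learning in every packet.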

CASE STUDY EXAMPLES 

This section presents compression study results for several different instruments. Compression performance is expressed as the Compression Ratio (CR), defined as the ratio of the quantization, in bits per sample, to the average codeword length, also in bits. Note that the CR value is data dependent and can vary from one test data set to the next.
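Written out, with n the quantization in bits, N the number of samples, and B the total number of coded bits (symbols introduced here for convenience):

```latex
\mathrm{CR} = \frac{n}{\bar{L}}, \qquad \bar{L} = \frac{B}{N}.
```

Inverting this gives the average code length implied by a reported ratio: the Landsat TM result below (n = 8, CR = 1.83) corresponds to about 8/1.83 ≈ 4.37 bits/sample, and the AOS result (n = 16, CR = 2.32) to about 16/2.32 ≈ 6.90 bits/sample.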

Landsat Thematic Mapper 

Mission Purpose: The Landsat program was initiated for the study of Earth’s surface and resources. Landsat-1, 2, and 3 were launched between 1972 and 1978. Landsat-4 was launched in 1982, and Landsat-5 in 1984.

Landsat Thematic Mapper (TM) on Landsat-4 and 5: The TM data represent typical land observation data. An image acquired by Landsat-4 at 30 m ground resolution for band 1, in the wavelength region of 0.45–0.52 µm, is shown in Figure 2. This 8-bit 512x512 image was taken over the Sierra Nevada in California.
Compression Study: Using a 1D predictor in the horizontal direction, a block size of 16 samples, and one reference sample inserted per image line, lossless compression gives a compression ratio of 1.83 for the 8-bit image.



Figure 2. Thematic Mapper image 


Soft X-ray Telescope (SXT) on Solar-A Mission 

Mission Purpose: The Solar-A mission, renamed Yohkoh after its successful launch in August 1991, is dedicated to the study of solar flares, especially high-energy phenomena observed in the X-ray and gamma-ray ranges.

Soft X-ray Telescope (SXT): The instrument detects X-rays in the wavelength range of 3–60 angstroms. It uses a 1024x1024 CCD detector array to cover the whole solar disk. Data acquired from the CCD are quantized to 12 bits and processed onboard to provide 8-bit telemetry data. The image in Figure 3 is an averaged 512x512 image with a dynamic range of up to 15 bits in floating-point format, the result of further ground processing.

Compression Study: The test image is first rounded to the nearest integer. A 1D predictor is then applied to this seemingly high-contrast image, achieving a compression ratio of 4.69.
Figure 3. Solar-A X-ray image


Acousto-Optical Spectrometer (AOS) on Submillimeter Wave Astronomy Satellite (SWAS) 

Mission Purpose: The Submillimeter Wave Astronomy Satellite (SWAS) is a Small Explorer (SMEX) mission, scheduled for launch in the summer of 1995 aboard a Pegasus launch vehicle. The objective of SWAS is to study the energy balance and physical conditions of molecular clouds in the Galaxy by observing the radio-wave spectra specific to certain molecules.

Acousto-Optical Spectrometer: The AOS uses a Bragg cell to convert the radio frequency energy from the SWAS submillimeter receiver into an acoustic wave, which then diffracts a laser beam onto a CCD array. The sensor has 1450 elements with 16-bit readout. A typical spectrum is shown in Figure 4(a), and an expanded view of a portion of two spectral traces is given in Figure 4(b). Because of detector nonuniformity, the difference in Analog-to-Digital Converter (ADC) gain between even and odd channels, and the effects of temperature variations, the spectra have nonuniform offset values between traces, in addition to a saw-tooth-shaped variation between samples within a trace. Because of limited onboard memory, this mission requires a compression ratio of over 2:1.

Compression Study: 1D prediction between adjacent samples is ineffective when the odd and even channels have different ADC gains. Using the multispectral predictor mode, the achievable CR is 2.32.
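The benefit of predicting across traces instead of between adjacent samples can be illustrated with synthetic spectra that carry an alternating even/odd gain. The exact form of the USES multispectral predictor is not described in this paper; here it is assumed, for illustration only, to predict each channel from the same channel in the previous trace. The spectrum shape, the 5% gain difference, and the shift between traces are invented.

```python
from typing import List

def mean_abs(xs: List[int]) -> float:
    """Mean absolute prediction residual."""
    return sum(abs(x) for x in xs) / len(xs)

# Hypothetical smooth spectrum seen through channels whose ADC gain alternates
# between even and odd elements, producing a saw-tooth pattern.
base = [1000 + 10 * i for i in range(32)]
gain = [1.00 if i % 2 == 0 else 1.05 for i in range(32)]
trace1 = [round(b * g) for b, g in zip(base, gain)]
trace2 = [round((b + 3) * g) for b, g in zip(base, gain)]  # next trace, slightly shifted

# 1D prediction between adjacent samples within a trace: the gain step dominates.
res_1d = [b - a for a, b in zip(trace2, trace2[1:])]
# Cross-trace prediction from the same channel: the gain pattern cancels.
res_cross = [b - a for a, b in zip(trace1, trace2)]
```

In this toy case the within-trace residuals alternate between roughly +60 and −40 counts, while the cross-trace residuals stay at 3 or 4 counts, because both traces share the same even/odd gain pattern.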
Figure 4. AOS radio wave spectrum

Gamma-Ray Spectrometer on Mars Observer 

Mission Purpose: Mars Observer was launched in September 1992. It was to collect data through several instruments to help scientists understand the Martian surface, atmospheric properties, and the interactions between the various elements involved. In the summer of 1993, contact with the spacecraft was lost.

Gamma-Ray Spectrometer (GRS): The spectrometer uses a high-purity germanium detector for gamma rays. The flight spectrum is collected over sixteen thousand channels, and the total energy range of a spectrum extends from 0.2 MeV to 10 MeV. Typical spectra for 5-second and 50-second collection times are given in Figure 5. These spectra show the random nature of the counts. The dynamic range of the spectral counts is 8 bits.

Compression Study: The achievable compression depends on the collection time. At a 5-second collection time the CR is over 20, and it decreases as the collection time increases; at a 20-second collection time, the CR is over 10.
Figure 5. Gamma-ray spectrum

CONCLUSION

A lossless data compression technology based on the enhanced Rice algorithm has been successfully developed for remote sensing applications. The performance of the algorithm has been established through analysis and simulation. Hardware in VLSI form, as well as software, is currently available for space flight missions. Over a dozen case studies have been performed on post-flight data, and several new missions have adopted the technology for onboard implementation.